Arabic text recognition of printed manuscripts. Efficient recognition of off-line printed Arabic text using Hidden Markov Models, Bigram Statistical Language Model, and post-processing.
Arabic text recognition has not been researched as thoroughly as text recognition for other languages. The need for automatic Arabic text recognition is clear. In addition to traditional applications such as postal address reading, check verification in banks, and office automation, there is considerable interest in searching scanned documents available on the internet and in searching handwritten manuscripts. Other possible applications include building digital libraries, recognizing text on digitized maps, recognizing vehicle license plates, serving as the first phase in text readers for visually impaired people, and understanding filled forms.
This research work aims to contribute to current research in optical character recognition (OCR) of printed Arabic text by developing novel techniques and schemes that advance the performance of state-of-the-art Arabic OCR systems.
A statistical and analytical study of Arabic text was carried out to estimate the occurrence probabilities of Arabic characters for use with Hidden Markov Models (HMM) and other techniques.
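As an illustration of the kind of statistic involved, the following minimal sketch estimates character bigram probabilities from raw text. It is not the procedure used in the thesis: the function name `bigram_probabilities`, the three-word sample, and the choice to count pairs only within whitespace-separated words are assumptions made for this example.

```python
from collections import Counter


def bigram_probabilities(text):
    """Estimate P(next | current) for adjacent characters inside each word."""
    pair_counts = Counter()   # how often (current, next) occurs
    first_counts = Counter()  # how often `current` is followed by any character
    for word in text.split():
        for current, nxt in zip(word, word[1:]):
            pair_counts[(current, nxt)] += 1
            first_counts[current] += 1
    # Maximum-likelihood estimate: pair count divided by count of the first character
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


# Hypothetical three-word sample; a real study would use a large corpus.
SAMPLE = "كتب كتاب كاتب"
probs = bigram_probabilities(SAMPLE)

for (current, nxt), p in sorted(probs.items()):
    print(current, nxt, round(p, 3))
```

Probabilities of this form could serve as transition weights in a character-level language model; smoothing for unseen pairs is deliberately left out of the sketch.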
Since there is no publicly available dataset of printed Arabic text for recognition purposes, one was created. In addition, a minimal Arabic script is proposed. The proposed script contains all basic shapes of Arabic letters and provides an efficient representation of Arabic text in terms of effort and time.
Based on the success of using HMM for speech and text recognition, the use of HMM for the automatic recognition of Arabic text was investigated. The HMM technique adapts to noise and font variations and does not require word or character segmentation of Arabic line images.
In the feature extraction phase, experiments were conducted with a number of different features to investigate their suitability for HMM. Finally, a novel set of features, which resulted in high recognition rates for different fonts, was selected.
The developed techniques do not need word or character segmentation before the classification phase as segmentation is a byproduct of recognition. This seems to be the most advantageous feature of using HMM for Arabic text as segmentation tends to produce errors which are usually propagated to the classification phase.
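How segmentation can fall out of recognition may be easier to see with a toy Viterbi decoder. The sketch below is not the system described in the thesis: the two one-state character models (`alef`, `beh`), the quantized frame features (`tall`, `wide`), and every probability are invented for illustration.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for `obs` (log-domain Viterbi)."""
    # best[t][s]: log-probability of the best path that ends in state s at frame t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Trace the best path backwards from the best final state
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def segments(path):
    """Collapse a frame-level path into (label, first_frame, last_frame) runs."""
    runs, start = [], 0
    for t in range(1, len(path) + 1):
        if t == len(path) or path[t] != path[start]:
            runs.append((path[start], start, t - 1))
            start = t
    return runs


# Hypothetical model: one state per character, two quantized frame features.
STATES = ["alef", "beh"]
START = {"alef": 0.5, "beh": 0.5}
TRANS = {"alef": {"alef": 0.7, "beh": 0.3}, "beh": {"alef": 0.3, "beh": 0.7}}
EMIT = {"alef": {"tall": 0.9, "wide": 0.1}, "beh": {"tall": 0.1, "wide": 0.9}}
OBS = ["tall", "tall", "tall", "wide", "wide"]  # frames sliced from a line image

path = viterbi(OBS, STATES, START, TRANS, EMIT)
boundaries = segments(path)
print(boundaries)
```

The decoder is never told where one character ends and the next begins; the boundaries are read off the decoded state path, so no separate segmentation step precedes classification.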
Eight different Arabic fonts were used in the classification phase. The recognition rates ranged from 98% to 99.9% depending on the font. As far as we know, these are new results in their context. Moreover, the proposed technique could be used for other languages. A proof-of-concept experiment was conducted on English characters with a recognition rate of 98.9% using the same HMM setup. The same techniques were applied to Bangla characters with a recognition rate above 95%.
Multi-font recognition of printed Arabic text was also carried out using the same technique. Fonts were categorized into different groups. New high recognition results were achieved.
To enhance the recognition rate further, a post-processing module was developed to correct the OCR output through character-level and word-level post-processing. The use of this module increased the recognition rate by more than 1%.
King Fahd University of Petroleum and Minerals (KFUPM)
TOWARDS AN ARABIC UPPER MODEL: A PROPOSAL
This work introduces the notion of a computational resource for organising knowledge developed for natural language realisation, the Upper Model. The links between the upper model and the domain knowledge on one side, and between the upper model and surface realisation on the other, are briefly presented. Systemic functional grammar, a typical grammar to be interfaced to the upper model for surface realisation, is discussed. Then, some Arabic characteristics, mainly Arabic grammar, are introduced. A limited number of areas where Arabic and English grammars differ are listed. The need to adapt the current upper model to support natural language generation for Arabic is highlighted, along with the need to develop an Arabic systemic grammar. Procedures for future research work in the field are described.
Techniques For High Quality Arabic Speech Synthesis
The paper proposes a diphone/sub-syllable method for Arabic Text-to-Speech (ATTS) systems. The proposed approach exploits the particular syllabic structure of Arabic words. For good quality, the boundaries of the speech segments are chosen to occur only at the sustained portion of vowels. The speech segments consist of consonant-half vowels, half vowel-consonants, half vowels, middle portions of vowels, and suffix consonants. The minimum set consists of about 310 segments for classical Arabic.
Statistical Analysis of Arabic Text to Support Optical Arabic Text Recognition
Abstract: This study summarizes the results of a statistical study of how often letters and syllables occur in Arabic words. The results presented include the frequency of each Arabic letter in each syllable, and the frequency of each letter together with the letter that follows it in the different syllables, for all letters. The study also covers statistics on the use of letters and syllables and the proportion of each in the different usage cases in Arabic. The study was applied to the two books Sahih al-Bukhari and Sahih Muslim. It is useful in supporting automatic recognition of Arabic writing, as well as error correction after recognition.
New fault models and efficient BIST algorithms for dual-port memories
The testability problem of dual-port memories is investigated. A functional model is defined, and architectural modifications to enhance the testability of such chips are described. These modifications allow multiple access of memory cells for increased test speed with minimal overhead on both silicon area and device performance. New fault models are proposed, and efficient O(n) test algorithms are described for both the memory array and the address decoders. The new fault models account for the simultaneous dual-access property of the device. In addition to the classical static neighborhood pattern-sensitive faults, the array test algorithm covers a new class of pattern-sensitive faults, duplex dynamic neighborhood pattern-sensitive faults (DDNPSF).
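To show what a linear-time memory test looks like in general, the sketch below runs the well-known MATS+ march test against a simulated single-port memory with an injected stuck-at fault. It is not the dual-port algorithm of this abstract, whose details are not given here; the `FaultyMemory` class, its parameters, and the fault model are hypothetical.

```python
class FaultyMemory:
    """Simulated bit-addressable memory with an optional stuck-at cell (hypothetical)."""

    def __init__(self, size, stuck_addr=None, stuck_value=0):
        self.cells = [0] * size
        self.stuck_addr = stuck_addr
        self.stuck_value = stuck_value

    def write(self, addr, bit):
        # A stuck-at cell ignores every write
        if addr != self.stuck_addr:
            self.cells[addr] = bit

    def read(self, addr):
        # A stuck-at cell always returns its stuck value
        if addr == self.stuck_addr:
            return self.stuck_value
        return self.cells[addr]


def mats_plus(mem, size):
    """Run MATS+ (5n operations) and return the sorted addresses that failed a read."""
    failures = set()
    for addr in range(size):            # M0: write 0 to every cell
        mem.write(addr, 0)
    for addr in range(size):            # M1: ascending, read 0 then write 1
        if mem.read(addr) != 0:
            failures.add(addr)
        mem.write(addr, 1)
    for addr in reversed(range(size)):  # M2: descending, read 1 then write 0
        if mem.read(addr) != 1:
            failures.add(addr)
        mem.write(addr, 0)
    return sorted(failures)


good = mats_plus(FaultyMemory(16), 16)
bad = mats_plus(FaultyMemory(16, stuck_addr=5, stuck_value=0), 16)
print(good, bad)
```

Each cell is visited a fixed number of times, so the run time grows linearly with memory size, which is the sense in which such tests are O(n). Pattern-sensitive faults of the kind named in the abstract need richer march elements than this sketch uses.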